Very important pharmacogenetic variants landscape and potential clinical relevance in the Zhuang population from Yunnan province

The gradual evolution of pharmacogenomics has shed light on the genetic basis for inter-individual drug response variations across diverse populations. This study aimed to identify pharmacogenomic variants that differ in Zhuang population compared with other populations and investigate their potential clinical relevance in gene-drug and genotypic-phenotypic associations. A total of 48 variants from 24 genes were genotyped in 200 Zhuang subjects using the Agena MassARRAY platform. The allele frequencies and genotype distribution data of 26 populations were obtained from the 1000 Genomes Project, followed by a comparison and statistical analysis. After Bonferroni correction, significant differences in genotype frequencies were observed of CYP3A5 (rs776746), ACE (rs4291), KCNH2 (rs1805123), and CYP2D6 (rs1065852) between the Zhuang population and the other 26 populations. It was also found that the Chinese Dai in Xishuangbanna, China, Han Chinese in Beijing, China, and Southern Han Chinese, China showed least deviation from the Zhuang population. The Esan in Nigeria, Gambian in Western Division, The Gambia, and Yoruba in Ibadan, Nigeria exhibited the largest differences. This was also proved by structural analysis, Fst analysis and phylogenetic tree. Furthermore, these differential variants may be associated with the pharmacological efficacy and toxicity of Captopril, Amlodipine, Lisinopril, metoclopramide, and alpha-hydroxymetoprolol in the Zhuang population. Our study has filled the gap of pharmacogenomic information in the Zhuang population and has provided a theoretical framework for the secure administration of drugs in the Zhuang population.


Study subjects
In total, 200 unrelated Zhuang subjects (110 females and 90 males) were recruited from Wen Shan in the Yunnan Province of China.The sample size and the proportion were determined using G*Power 3.1.9.2 software 16 .The participants were healthy based on their medical history and physical examination.Additionally, they had at least three generations of Zhuang ancestry, while none of the other populations had any known ancestral background.Subjects with chronic diseases, infectious diseases, drug or alcohol abuse, severe heart, liver or kidney dysfunction, immune disorders, pregnancy, and lactation were excluded.The informed consent forms have been signed by all subjects.According to the study protocol approved by the Clinical Research Ethics Committee of Northwest University, 5 mL of peripheral blood was collected from each subject and stored at 4 °C for 24 h.

Variants selection and genotyping
Through an extensive literature review on drug metabolism and toxicity, we identified 24 genes associated with these phenomena.By utilizing resources such as the PharmGKB database, the SNP database of NCBI, and the International HapMap Project, in addition to relevant pharmacogenomics literature, we selected variants linked to drug therapy responsiveness.A preliminary screening identified 59 variants.However, only homozygous genotypes were observed for 11 of these variants, making it impossible to compare the distribution and differences in genotype frequency.Consequently, these 11 variants were excluded from our analysis, leaving 48 variants for further investigation.
Genomic DNA was extracted from participants' peripheral blood using GoldMag-Mini Whole Blood Genomic DNA Purification Kit (GoldMag Ltd., Xi'an, China).The concentration of genomic DNA was measured using NanoDrop 2000C spectrophotometer (Thermo Scientific, Waltham, MA, USA).Subsequently, multiplexed SNV MassEXTEND assays were designed using Agena MassARRAY Assay Design 4.0 software (San Diego, California, USA), which allowed for the design of PCR primers for the selected VIP variants.Agena MassARRAY RS1000 (San Diego, California, USA) was able to genotype the 48 VIP variants according to the manufacturer's instructions 17 .Finally, the data of SNV genotypes were collected and managed using Agena Typer 4.0 software 18 , as mentioned in previous studies.

Structure analysis and Fst analysis
The Structure 2.3.4 software was used to analyze the structure of 27 populations, and Arlequin3.1 software was used to evaluate pairwise Fst values for assessing the relationship between 27 population groups.In addition, MEGA11 software was utilized to plot phylogenetic tree.

Statistical analysis
The data were compiled, ordered, and analyzed using Microsoft Excel 2019 (Microsoft, Redmond, WA, USA) and SPSS 26.0 (SPSS, Chicago, IL, USA).The χ 2 test was utilized to estimate the Hardy-Weinberg equilibrium (HWE) and compare the divergences in genotype frequencies of 48 VIP variants between the Zhuang population and the other 26 populations.All statistical tests were two-tailed (p < 0.05).Bonferroni corrections were performed to determine the significance level.After the Bonferroni's multiple tests, p < 4.01 × 10 −5 was recognized as statistically significant.

Ethics approval
This study was conducted by the World Medical Association Declaration of Helsinki and was approved by the Northwestern University Clinical Research Ethics Committee (Approval number of Ethics Committee: 230,413,002).All subjects signed an informed consent form.

Basic characteristics of candidate VIP variants
The 48 VIP variants on 24 genes that satisfied the HWE equation (p > 0.05) were collected in this study.Table 1 summarizes the fundamental characteristics of these variants, including gene name, SNVs ID, position, functional consequence, genotype frequency, and minor allele frequency (MAF) in the Zhuang population.Additionally, Table S1 shows the PCR primers for the gathered VIP variants.

Genetic structure analysis of 27 populations
A model-based clustering approach was used to analyze the genetic structure of the 27 populations distributed in Africa, America, East Asia, Europe and South Asia to further analyze their relationship.Based on the Structure 2.3.1 Software, different K values ranging from 5 to 8 were hypothetically considered in structure analysis.When K = 5, the groups were divided into 5 subgroups based on the relative majority probability of assigning individuals to subgroups (subgroup 1: GWD and LWK; Subgroup 2: BEB, CEU, FIN, GBR, IBS, TSI, CLM, MXL and PUR; Subgroup 3: Zhuang, CDX, CHB, CHS, JPT, KHV and PUR; Subgroup 4: GIH, ITU, PJL and STU; Subgroup 5: ACB, ASW, ESN, MSL and YRI).It can be observed from Fig. 2 that Zhuang population have a stronger affinity with CDX, CHB, CHS, JPT, KHV and PUR.This is consistent with the results in Table 2.
The pairwise Fst values were used to assess relationships among 27 populations, as shown in Table 3 and Fig. 3A.The Fst values between the Zhuang population and the East Asian population (CDX, CHB, CHS, JPT and KHV) were small, which were 0.065, 0.068, 0.066, 0.073 and 0.067, respectively (Table 3).Smaller Fst values indicate closer relationships between the two groups and suggest that they share similar genetic backgrounds.The result is confirmed by the phylogenetic trees of 27 populations shown in Fig. 3B.

Genotype frequencies of four significantly different SNVs
Moreover, the genotype frequency distribution of rs776746 (CYP3A5), rs4291 (ACE), rs1805123 (KCNH2), and rs1065852 (CYP2D6) in 26 populations are shown in Fig. 4. The genotype frequency of rs4291-AT in the Zhuang population is remarkably higher than that of the other 26 populations.The CC genotype frequency of rs776746 is similar to that of EUR and significantly higher than that of AFR.In the Zhuang population, the frequency of the www.nature.com/scientificreports/rs1805123-GG genotype is notably higher compared to that observed in the other 26 populations.The frequency of rs1065852-GG is similar to that of KHV, CHS, CHB, and CDX, and lower than that of other populations.

MAF distribution of four significantly different SNVs
Based on the allele frequencies calculated in this study, we plotted a map of the MAF distribution of VIP variants that substantially differed from the other 26 populations.According to Fig. 5, the allele frequency of rs776746 at CYP3A5 in the Zhuang population was similar to that in the European population, despite their close genetic affinity with East Asians.The G allele of rs1805123 at KCNH2 was nearly fixed in the Zhuang population and had a low frequency in other global populations.The MAF of rs1065852 (CYP2D6) was similar to that of East Asians and higher than other populations.However, there were no significant differences in T allele frequency for rs4291 at ACE among different populations.

Clinical relevance of significant variants
The Table 4 presents the clinical annotation information of the VIP variants in PharmGKB.The genotype frequency of rs776746 (CYP3A5) has been shown to have an impact on the dose, toxicity and metabolism of tacrolimus [19][20][21] .Specific mutations in rs4291 (ACE) have been implicated in the metabolism of anti-hypertensive drugs such as amlodipine, sodium chlorthalidone and lisinopril 22 .Furthermore, they also influenced the risk of aspirin intolerance in asthmatics exposed to aspirin 23 .The efficacy of metoclopramide in patients with gastric disease was found to be associated with rs1805123 (KCNH2) 24 .Rs1065852 played an indispensable role in the regulation of the α-hydroxymetoprolol metabolism in patients with non-small cell lung cancer 25 .

Prediction of functional damage in proteins
Subsequently, we used the PolyPhen-2, SNAP2, FATHMM, Mutationtaster, and Mutationassessor online databases to predicte whether the four SNVs would affect protein structure and function (Table 5).The results indicated that rs1805123 would cause a mutation from K to T at position 897 of KCNH2, however, this mutation was considered benign and less harmful to the protein in most databases.In contrast, rs1065852 caused a mutation from P to A at the 34th position of CYP2D6.The database predicted that this change would severely impair the protein's function and potentially contribute to certain diseases.Additionally, Chimera v1.16 was utilized for predicting the structure of point mutations of CYP2D6 and KCNH2, as shown in Fig. 6.

Discussion
During the development of biological sciences, it has gradually been realized that genetic differences between populations have an essential influence on drug metabolism, dosages and ADRs.This can potentially affect the efficacy of certain medications in specific populations.Pharmacogenomics research is gradually illuminating the genetic factors responsible for variations in drug utilization among diverse populations.For instance, an Vol:.( 1234567890) important study conducted by Wen et al. demonstrated that there were significant differences in allele frequencies of key genetic variants affecting drug selection and dosing between Hmong and East Asian populations 26 .Furthermore, pharmacogenomics studies have been reported on Mongolian 27 , Tibetan 28 , and Blang 29 , among others.However, there have been few pharmacogenomics studies conducted on the Zhuang population.
In this study, 200 Zhuang subjects in Yunnan Province were recruited and genotyped for 48 VIP variants on 24 candidate genes.The genotypic distribution was compared to that of 26 populations from the 1000G dataset.The results revealed significant differences in CYP3A5 (rs776746), ACE (rs4291), KCNH2 (rs1805123), and CYP2D6 (rs1065852) between Zhuang population and the other 26 populations.We also used the PharmGKB database to annotate significantly different SNVs.Our study on VIP polymorphism in the Zhuang population may provide tailored therapy for the Zhuang population.
The cytochrome P450 (CYP) superfamily is an ancient enzyme family found in hundreds of eukaryotic and prokaryotic organisms 30 .The human genome encodes 57 putative functional CYP genes, as well as 58 pseudogenes.Among these 57 functional human CYPs, 12 are involved in the metabolism of 70-80% commonly used drugs, including CYP2D6 and CYP3A5 31 .The human CYP2D6 gene is relatively short, spanning only about 4.3 Kbps on the long arm of chromosome 22 (22q13.2).The CYP2D6 gene is composed of 9 exons and encodes the CYP2D6 protein, which is localized in the endoplasmic reticulum.This protein exhibited highly  expressed levels in the liver, brain, intestinal tissues, and lymphocytes 32 .There were large population differences in the distribution of CYP2D6 alleles, which could lead to variations in drug utilization among different populations 33 .rs1065852 has been reported to result in reduced protein stability and a poor response to drugs such as iloperidone, atorvastatin, antidepressants, and antipsychotics [34][35][36][37] .Moreover, allele A was associated with decreased clearance of alpha-hydroxymetoprolol in healthy individuals and a higher plasma concentration of S-didesmethyl-citalopram when treated with citalopram or escitalopram in people with Depressive Disorder compared to allele G 25,38 .In this study, the MAF of rs1065852-A (54.80%) was higher than that in SAS (16.70%),EUR (20.30%),AFR (11.60%), and AMR (14.60%).In addition, the frequency of rs1065852-GG was lower than that in other populations except for EAS.Therefore, the differences in drug efficacy and safety caused by CYP2D6 rs1065852 should be taken into consideration in the Zhuang population.CYP3A5, which is located in chromosome 7q21.1, is involved in the metabolism of many drugs.Tacrolimus, one of the substrates of CYP3A5, is widely used as an immunosuppressive agent for organ transplantation 39 .The expression of CYP3A5 varied among different populations, which may have an impact on drug metabolism in those populations 21 .One study identified that genotype CT was associated with a higher tacrolimus dose in renal transplant patients compared to genotype CC 21 .The results of Flores-Pérez et al. revealed that critically ill Mexican pediatric patients with the CYP3A5*3 allele variant (rs776746) had increased plasma levels of midazolam and higher drug clearance 3 h after the end of the infusion compared to carriers with the normal allele 40 .The study by Liang et al. pointed out that individuals with the rs776746-CC had an increased risk of amlodipine-induced peripheral edema in a dominant model among Chinese Han hypertensive patients 41 .In our study, the frequencies of CT, TT and CC of rs776746 were 0.5%, 9.5% and 90.0%, separately.The frequency of CT genotype in the Zhuang was lower than that in the other 26 populations, highlighting the importance of considering metabolism and absorption of specific drugs in the Zhuang population.
KCNH2 is a gene that encodes a component of voltage-activated potassium channel found in cardiac muscle, neuronal cells, and microglia.Four copies of this protein interact with a copy of the KCNE2 protein to form a functional potassium channel.Mutations in this gene can lead to long QT syndrome type 2 (LQT2) 42 .A recent study has identified KCNH2 p.Gly262AlafsTer98 as a novel pathogenic variant associated with long QT syndrome in a Spanish population 43 .In a separate study, it was found that KCNH2 mutations cause fetal biventricular densified cardiomyopathy with pulmonary stenosis and bradycardia 44 .The efficacy of metoclopramide in patients with gastric disease was found to be correlated with the polymorphism of KCNH2 gene (rs1805123, p = 0.020) 24 .A Marjamaa et al. found that allele G of rs1805123 was associated with a shorter QT interval in a Finnish population compared to the TT genotype 45 .In our study, the frequency of rs1805123-G was significantly higher in the Zhuang population than in the other 26 populations (88.70%).The rs1805123 causes a K-T mutation at  site 897 of KCNH2.Although most databases predict this mutation to be benign, attention should be paid to the shorter/long QT interval and the dose of metoclopramide in the Zhuang population.ACE, which encodes an enzyme, is known to participate in the regulation of blood pressure and electrolyte balance.Numerous studies have shown that ACE is closely associated with nervous system diseases 46,47 , cardiovascular diseases 48 , and hypertension 49,50 .In a previously published study, we found that the rs4291 genotype influenced drug dosing in the treatment of the disease.De Oliveira et al. found a correlation between the use of brain-penetrating angiotensin converting enzyme inhibitors (ACEIs) (such as captopril or perindopril) in antihypertensive therapy and rs4291 46 .Another study has confirmed that the AA genotype of rs4291, compared to the genotype AT + TT, is associated with a reduced severity of renal failure in patients with Alzheimer's disease treated with captopril 51 .Furthermore, rs1800764 and rs4291 also formed haplotypes.A study discovered that ACEIs decelerated cognitive decline in individuals carrying the ACE haplotype with rs1800764-T and rs4291-A, as well as those carrying the APOE4 haplotype with either rs1800764-T or rs4291-T, regardless of changes in blood pressure 52 .Our study demonstrated that the allele frequency of rs4291 (ACE) differed significantly between the Zhuang population and the other 26 populations, which has been found to be associated with drug metabolism for various diseases, such as captopril, aspirin, and amlodipine.It may provide guidance for precision drug administration in the Zhuang population.
Our results are likely to complement the pharmacogenomic information of the Zhuang population and refine the study on the differences between the Zhuang population and the other 26 populations.More importantly, this study may provide certain theoretical support for drug use in the Zhuang population.Nonetheless, there are some limitations to our study.Our sample size was relatively small in this study.To design a comprehensive, systematic, disease-specific treatment protocol for the Zhuang population, we need to further expand the sample size for more in-depth studies.In addition, only Agena MassARRAY was used for genotyping in this study, and no other orthogonal method was employed to validate the sequencing, which will be utilized in subsequent studies.

Conclusion
In short, the genotype frequencies of CYP3A5 (rs776746), ACE (rs4291), CYP2D6 (rs1065852), and KCNH2 (rs1805123) showed significant disparities between the Zhuang population and 26 other populations.Our study can not only enrich the pharmacogenomics database of the Zhuang population but also provide a theoretical basis for tailored therapy in this population and ensure safe drug use for patients.

Figure 1 .
Figure 1.The amount of difference variants between the Zhuang population and 26 populations.The size of the rectangle indicates the number of different variants between the Zhuang population and the other 26 populations from five regions.

Figure 2 .
Figure 2. Structure analysis of the genetic relationship between the Zhuang population and the other 26 populations.K denotes the possible numbers of parental population clusters.Each vertical bar represents a sample, dividing into color sections.K = 5 were utilized to evaluate the relationship between Zhuang and 26 populations.

Figure 3 .
Figure 3. Fst value heamap and phylogenetic tree among 27 populations.(A) Heatmap based on the pairwise Fst values between 27 populations.(B) The phylogenetic tree was constructed by the neighboring-joining method among 27 populations.

Figure 4 .
Figure 4.The distribution of genotype frequencies for significantly different SNVs in 27 populations at the rs776746, rs4291, rs1805123 and rs1065852.

Figure 5 .
Figure 5.The map of the allele frequency distribution for significantly different SNVs in 27 populations at the rs1065852, rs4291, rs776746 and rs1805123.

Figure 6 .
Figure 6.Structural prediction of point mutated proteins.(A) 3D structure of the CYP2D6 protein, with the yellow part being a SNV.(B) rs1065852 mutated local structure.(C) 3D structure of the KCNH2 protein, with the yellow part being a SNV mutation.(D) rs1805123 mutated local structure.

Table 1 .
Basic information of 48 selected VIP variants in the Zhuang population.SNVs: single nucleotide variants, Chr: chromosome, BP: base pairs, ID: identity documents, MAF: minor allele frequency.

Table 3 .
Pairwise Fst values among the Zhuang and 26 populations.

Table 4 .
Clinical annotation of very important pharmacogenomic variants with significant differences.ADR: Adverse drug reactions.